





THE RAYLEIGH CURVE AS A MODEL FOR EFFORT DISTRIBUTION OVER THE LIFE
OF MEDIUM SCALE SOFTWARE SYSTEMS. M.S. Thesis - Maryland Univ.
(NASA) 154 p




SOFTWARE ENGINEERING LABORATORY SERIES 


THE RAYLEIGH CURVE AS A 
MODEL FOR EFFORT 
DISTRIBUTION OVER THE LIFE 
OF MEDIUM SCALE 
SOFTWARE SYSTEMS 


DECEMBER 1981


National Aeronautics and 
Space Administration 

Goddard Space Flight Center 

Greenbelt, Maryland 20771


FOREWORD 


The Software Engineering Laboratory (SEL) is an organization 
sponsored by the National Aeronautics and Space Administra- 
tion, Goddard Space Flight Center (NASA/GSFC) and created 
for the purpose of investigating the effectiveness of
software engineering technologies when applied to the 
development of applications software. The SEL was created 
in 1977 and has three primary organizational members: 

NASA/GSFC (Systems Development and Analysis Branch) 

The University of Maryland (Computer Sciences Department) 
Computer Sciences Corporation (Flight Systems Operation) 

The goals of the SEL are (1) to understand the software
development process in the GSFC environment; (2) to measure
the effect of various methodologies, tools, and models on
this process; and (3) to identify and then to apply successful
development practices. The activities, findings, and
recommendations of the SEL are recorded in the Software
Engineering Laboratory Series, a continuing series of reports
that includes this document. A version of this document was
originally drafted as a thesis in December 1981, and was
also issued as University of Maryland Technical Report 
TR-1186. 

The primary contributor to this document was 

Gino O. Picasso (University of Maryland)

Other contributors include 

Victor Basili (University of Maryland) 

Single copies of this document can be obtained by writing to 

Frank E. McGarry 
Code 582.1 
NASA/GSFC 

Greenbelt, Maryland 20771 



Technical Report TR-1186 


July 1982 
NSG-5123 


THE RAYLEIGH CURVE AS A MODEL FOR 
EFFORT DISTRIBUTION OVER THE LIFE 
OF MEDIUM SCALE SOFTWARE SYSTEMS* 


Gino Picasso 

Department of Computer Science 
University of Maryland 
College Park, MD 20742 


Thesis submitted to the Faculty of the Graduate School 
of the University of Maryland in partial fulfillment
of the requirements for the degree of 
Master of Science 
1982 


*Research supported in part by National Aeronautics and Space
Administration grant NSG-5123 to the University of Maryland.
Computer time supported in part through the facilities of the
Computer Science Center of the University of Maryland.








ABSTRACT 


Title of Thesis The Rayleigh Curve as a Model of Effort Distri- 
bution Over the Life of Medium Scale Software 
Systems 




Putnam has shown that the Rayleigh curve is an adequate
model for the life-cycle effort distribution of large scale
systems. Previous investigations into the applicability of this
model to medium scale software development efforts have met with
mixed results. The results of these investigations are confirmed
by analyses of runs and smoothing. The reasons for the model's
failure are found in the subcycle effort data. There are four
contributing factors: uniqueness of the environment studied, the
influence of holidays, varying management techniques and
differences in the data studied.


1. INTRODUCTION

Putnam has claimed that the Rayleigh equation accurately models
the software life-cycle effort distribution of large projects
[Putnam1, Putnam2, Putnam3, Putnam4]. He uses the derivative form
of this equation, y' = 2*K*a*t*exp(-a*t**2), to predict the
man-power distribution over the life of a software system. This
equation is fully determined by the parameters K and td. K is the
total effort expended and td is the time to development or the
time to peak effort (i.e., t = td at the curve peak). The shape
parameter a relates to td by the formula a = 1/(2*td**2). The two
parameters K and td can be estimated using Bayesian inference on
the data gathered from previous programs.
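
The derivative form quoted above can be evaluated directly. The following is a minimal Python sketch, not part of the original study; the function names and the values of K and td are hypothetical and chosen only for illustration. It also uses the cumulative form K*(1 - exp(-a*t**2)), which is the integral of the derivative form.

```python
import math

def rayleigh_rate(t, K, td):
    """Effort rate y'(t) = 2*K*a*t*exp(-a*t**2), with a = 1/(2*td**2)."""
    a = 1.0 / (2.0 * td ** 2)
    return 2.0 * K * a * t * math.exp(-a * t ** 2)

def rayleigh_cumulative(t, K, td):
    """Cumulative effort K*(1 - exp(-a*t**2)), the integral of the rate."""
    a = 1.0 / (2.0 * td ** 2)
    return K * (1.0 - math.exp(-a * t ** 2))

# Hypothetical values, for illustration only.
K, td = 10000.0, 20.0

# The rate at t = td is the peak; algebraically it equals K / (td * sqrt(e)).
peak = rayleigh_rate(td, K, td)

# Fraction of total effort spent by the peak: 1 - exp(-0.5), about 0.39.
spent_at_peak = rayleigh_cumulative(td, K, td) / K
```

Under this form, roughly 39 percent of the total effort K has been expended when the peak is reached, whatever values K and td take.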

From the Rayleigh curve Putnam derives several project
parameters which help classify the system. These include
difficulty, the state of technology of a software house (roughly a
measure of its ability to do software development), and
productivity. When these parameters have been determined for an
installation and a particular system, feasibility regions for
software development for the installation can be derived. Time-cost
tradeoff curves can be drawn which management can use in decision
making. Predictions of software size can also be made.

Putnam has also claimed that the individual subcycle curves
of design and code, and test follow the Rayleigh curve. Putnam
has indicated that the individual subcycle effort distributions,
when added together, result in a Rayleigh-shaped project profile
curve.

The Rayleigh model is very appealing because of its simplicity,
management's familiarity with the parameters that determine
the equation and the practical aids it provides for decision mak- 
ing. For this reason and because the Rayleigh curve is an ade- 
quate model for large scale software effort distributions, the 
SEL (Software Engineering Laboratory at the University of Mary- 
land) chose to study the Rayleigh curve as a model of medium 
scale software effort distributions. 

Basili and Zelkowitz [B-Z] studied the applicability of the
Rayleigh model to medium scale software efforts. They tried 
using the model to predict total effort, maximum effort and time 
to acceptance testing. Mapp [M] continued this investigation. In 
addition, Mapp also compared the Rayleigh curve to other curves 
to determine whether or not the Rayleigh curve was indeed the 
underlying man-power curve for medium scale systems. Basili and 
Beane [BB1] compared the Rayleigh curve to the model proposed by 
Francis Parr [P] to determine which curve best described the 
man-power distribution of the smaller systems being studied. 
Basili and Beane checked to see whether or not the contractor's
rule of thumb algorithm for project manning was being followed
[BB1].

All these efforts have given mixed results about the appli- 
cability of the Rayleigh curve to medium scale development 
efforts. Basili and Beane did indicate that the contractor's
algorithm was being used as a rough guideline by the managers.
These results do not invalidate the Rayleigh curve as an optimal 
manning curve, but it cannot be clearly seen whether or not the 
Rayleigh curve is an adequate model for the man-power distribu- 
tion of these development efforts. 

In what follows, a more thorough investigation of the appli- 
cability of the Rayleigh curve is carried out. First the work 
done by previous investigators will be extended to determine 
whether or not the supposition that the Rayleigh curve fits the 
man-power distribution of medium scale systems is true. Trends in 
the data are studied to explain the deviations from the Rayleigh 
curve. These are looked into further by studying the effort dis- 
tribution over subcycles. The possibility of using the Rayleigh 
curve to classify these systems is explored. Finally, other
relations in the data are examined to try to find any invariants
which may aid in smoothing and in better understanding effort
distributions. One smoothing technique is used to elucidate the
basic trends in the data.

A description of the data used for the previous studies done
on medium scale systems is given first. A brief description of
the work which has gone before and the conclusions it led to
is also given. This is done in order to lay the foundation from
which the rest of the study will be conducted.


2. DESCRIPTION OF THE DATA EMPLOYED


It will be helpful at this point to discuss the data used for 
this paper and in the work done by the previous researchers in 
SEL. [Basili, et. al.] gives a more thorough explanation of the 
forms used to collect the data. 

The projects studied were primarily attitude control programs
ranging in size from 45000 to 112000 lines of code and
taking from 10000 to 24000 manhours to develop. The programs
were developed by the Computer Sciences Corporation (CSC) for the
National Aeronautics and Space Administration (NASA). The data
was collected by the Software Engineering Laboratory (SEL),
conducted jointly by NASA, CSC and the University of Maryland
Computer Science Department.

Two forms were used to gather the effort data under study.
The Resource Summary (RS) form was used for the studies mentioned
earlier. It consists primarily of accounting information. The
form is filled out at the end of each week by management and
represents the actual charges made to the project. It contains
the number of hours charged to the project by individual
programmers, managers and support personnel for each week of the
project. The Component Status Report (CSR) form, from which this
study draws much of its information, consists of the actual
number of hours spent by programmers on system development. Data
is available on a weekly basis by component and phase. Each
project is divided into three phases: design, code and test. Each
of these phases is divided into three subphases. Activities which
do not fit into these categories are reported under miscellaneous
charges, which make up a separate category. The effort data is
only available from the start of design through acceptance
testing.


The number of hours reported on the CSR forms is generally
lower than the number of hours reported on the RS forms. This is
to be expected since the data on the CSR forms represents the
actual number of hours worked on a project, whereas the RS data
represents the number of hours charged to a project. These do not
necessarily match because overhead is incurred in hours not
directly spent on development activities. The CSR forms probably
reflect the actual number of hours programmers spent on a project
more accurately than the RS forms. However, the accuracy of the
CSR data is somewhat suspect. The CSR forms are filled out by many
people (each individual involved in the project), resulting in
reporting inconsistencies. The RS forms, on the other hand, were
filled out by project managers (only one or two people filling
out the form per project), thus making them more consistent.
Furthermore, for a couple of projects, the CSR data for the early
stages of the project is missing because the forms had not yet
been made available.

The RS form, inasmuch as it consists of budget information,
includes the total number of hours charged to the project,
that is, the total weekly effort expended by all personnel
assigned to the project. The CSR form reports only effort expended
on particular components of the system and represents the number
of hours directly expended in development. If the hours reported
on the CSR forms for each week are added, what is obtained is the
total effort expended directly on development activities with
little overhead. In this paper the total weekly effort and the
total weekly development effort are differentiated. The total
weekly effort represents the effort reported on the RS forms and
includes all charges made to the project, including all overhead.
The total weekly development effort is obtained from the CSR forms
and represents the effort expended directly on development of
particular components in the system with very little overhead.

The first portion of this paper will center around the
analysis of the data from the RS forms, the total weekly effort.
This is done as a follow-on to the work done by previous
investigators in SEL. The second portion is a study of individual
phase effort distributions and how these relate to the total weekly
effort distribution. Also the relation between components and
effort is investigated. CSR forms are used to obtain this data.


3. EARLY WORK 


The work presented here was done as a continuation of the
studies conducted by the SEL at the University of Maryland. Much of
the work has focused on three aspects of project manning and
effort distribution models. First, effort distribution models
have been used to predict the values of three principal
parameters: total effort (K), peak effort or maximum man-power
requirements (yd), and time to acceptance testing (ta). Secondly,
the effort data has been studied to determine the underlying
man-power patterns followed when developing medium scale software
systems. This consisted of fitting various curve types to the
effort data over time and comparing how well these modeled the
effort distribution. Most recently the manning algorithm used by
management has been checked against the actual effort distribution
data. This has been done in order to determine how closely
management actually adheres to its own "rule of thumb" for
project staffing.

Basili and Zelkowitz were the first to study the applicability
of the Rayleigh curve to medium scale development efforts.
The data available to them did not match the data studied by
Putnam. It did not include the early effort spent on requirements
definition and the later effort spent on maintenance. Putnam had
observed, however, that for large projects the design/code and
test subcycles were Rayleigh in shape and that their sums were
also Rayleigh. Basili and Zelkowitz assumed this to be true of
medium scale development efforts as well. They reasoned that,
since the design, code and test subcycles are Rayleigh and their
sums are too, the major central portion of the Rayleigh curve for
the project profile should fit the design, code and test data
well. Using the Rayleigh equation, they derived equations for the
three quantities: ta, yd, and K. The equations which they
obtained were:

ta = 1.25*K/yd

yd = 1.25*K/ta

K = .80*ta*yd

These three parameters are estimated at the beginning of the pro- 
ject. Taking two of the parameter estimates the third value was 
calculated using these equations. The predictions of time to 
acceptance were very good (3% error). This was a better estimate 
than management had given. Only two projects were used in this 
study however. The estimates obtained for the other two parame- 
ters were not as good. 

Mapp derived a separate set of equations from the Rayleigh
curve using a shaping factor, a = 1/(td**2). The equations he
obtained were:

ta = 1.07*K/yd

yd = 1.07*K/ta

K = .93*ta*yd

Using the same procedure used by Basili and Zelkowitz, he
obtained even better estimates for time to acceptance than they
had.
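
The procedure described above, taking two of the estimated parameters and solving for the third, can be sketched in Python. This is an illustrative sketch only: the function name `third_parameter` and the constant names are hypothetical, and the inputs are the management estimates listed for project 2 in Table 1.

```python
def third_parameter(coef, ta=None, yd=None, K=None):
    """Given two of (ta, yd, K), return the third using ta = coef*K/yd."""
    missing = [name for name, value in (("ta", ta), ("yd", yd), ("K", K))
               if value is None]
    if len(missing) != 1:
        raise ValueError("supply exactly two of ta, yd, K")
    if ta is None:
        return coef * K / yd
    if yd is None:
        return coef * K / ta
    return ta * yd / coef

BASILI_ZELKOWITZ = 1.25   # ta = 1.25*K/yd, equivalently K = .80*ta*yd
MAPP = 1.07               # ta = 1.07*K/yd; 1/1.07 is only approximately .93

# Management estimates for project 2, as listed in Table 1.
ta_est, yd_est, K_est = 39, 280, 13000

bz_ta = third_parameter(BASILI_ZELKOWITZ, yd=yd_est, K=K_est)
bz_yd = third_parameter(BASILI_ZELKOWITZ, ta=ta_est, K=K_est)
bz_K = third_parameter(BASILI_ZELKOWITZ, ta=ta_est, yd=yd_est)
mapp_ta = third_parameter(MAPP, yd=yd_est, K=K_est)
```

Rounded to whole numbers, the three Basili and Zelkowitz values computed here agree with the corresponding project 2 entries in Table 1. Because Mapp's coefficients are quoted to only two digits, values computed from them may differ slightly from the tabulated ones.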


Each of these equations determines a Rayleigh curve. When the
Rayleigh curves corresponding to these six equations were 
obtained using management estimates of ta, yd and K, it was found 
that these did not seem to fit the data well. In fact, these 
curves were not even the best fitting Rayleigh curves. Since the 
Rayleigh curves which were responsible for the predictions did 
not fit the data well, it did not seem that the Rayleigh model 
was responsible for accurate predictions of time to acceptance. 

The estimates obtained from the Parr curve, principally ta,
were not any better than the Rayleigh curve predictions. It 
should also be noted that the parameters that determine the Parr 
curve are much more difficult to determine than the parameters 
that determine the Rayleigh curve. The results of these studies 
were inconclusive. 

Attempts to find the underlying curve for man-power distribution
were made. These consisted of fitting various curve types
to data. Three separate efforts were made. Basili and Zelkowitz
linearized the Rayleigh equation and did a least squares fit to
the data. Mapp used the least squares method used by Basili and
Zelkowitz and a simplified search method using the sum of errors
squared as an optimization criterion to fit four curve types. He
fitted the Rayleigh curve, a parabola, a trapezoid and a straight
line. Basili and Beane used Newton's method and the search method
used by Mapp to fit a three parameter version of the Parr curve,
a Rayleigh curve with a horizontal shift, a parabola and a
trapezoid.
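
The text does not state which linearization Basili and Zelkowitz used. One standard form, assumed here, follows from dividing the derivative form by t and taking logarithms: ln(y'/t) = ln(2*K*a) - a*t**2, which is linear in t**2. The Python sketch below fits that line by ordinary least squares. The function name and the synthetic, noise-free data are hypothetical and serve only to show that the known parameters are recovered.

```python
import math

def fit_rayleigh_linearized(weeks, effort):
    """Least squares fit of ln(y/t) = ln(2*K*a) - a*t**2; returns (K, a)."""
    # Only points with positive time and effort can be log-transformed.
    pairs = [(t, y) for t, y in zip(weeks, effort) if t > 0 and y > 0]
    xs = [t ** 2 for t, _ in pairs]
    zs = [math.log(y / t) for t, y in pairs]
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = z_mean - slope * x_mean
    a = -slope                             # slope of the line is -a
    K = math.exp(intercept) / (2.0 * a)    # intercept is ln(2*K*a)
    return K, a

# Synthetic weekly data generated from a known curve (illustration only).
K_true, td_true = 12000.0, 25.0
a_true = 1.0 / (2.0 * td_true ** 2)
weeks = list(range(1, 61))
effort = [2 * K_true * a_true * t * math.exp(-a_true * t ** 2) for t in weeks]

K_fit, a_fit = fit_rayleigh_linearized(weeks, effort)
td_fit = 1.0 / math.sqrt(2.0 * a_fit)
```

With noisy data the logarithm gives small effort values a large influence on the fit, which is one reason a direct search on the sum of errors squared may give a different curve.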

The three parameter Parr curve resulted in the best fit,
but it was not significantly better than the other curves.
Therefore it could not be concluded that the Parr curve was the
best model. Basili and Beane supposed that the fluctuations
present in the data made it impossible to determine the best
fitting curve. Basili and Zelkowitz had made a similar observation
earlier. In addition, Basili and Zelkowitz observed that because
medium scale projects assume more of a step function man-loading
curve it was difficult to determine where peak effort actually
occurred. Where this peak is chosen to be makes a significant
difference in the shape of the curve which is obtained.

Basili and Beane conducted a third type of study. They compared
a "rule of thumb" staffing algorithm said to be used by
management to the actual effort distribution data. This allowed 
them to determine whether or not management was indeed adhering 
to their "rule of thumb." The algorithm proposed by management 
was as follows: 

1. At the start of the project assign 1/2 to 3/4 "full staffing"
(due to lack of early funding and problems in finding
available people).

2. At the end of the design phase, plus or minus a month, build
to full staffing.

3. During the coding phase maintain full staffing.

4. During the testing phase:

a. if the project is on schedule, decrease manning as
appropriate.

b. if the project is behind schedule, work overtime.

c. if there are late changes to the user requirements,
increase manning by an additional 1/3.

As Basili and Beane pointed out, this algorithm would indicate 
that management has a great deal of flexibility in terms of 
staffing to handle problems when they arise. When the algorithm 
was checked against the data, it was seen that the data 
corresponded fairly well to the algorithm. Basili and Beane con- 
cluded that the hypothesis that management was indeed using this 
staffing algorithm could not be rejected. 

These results do not favor adopting the Rayleigh curve as a
model for medium scale development effort distribution. However,
none of the studies has been conclusive and further investigation
is warranted. In the following sections, the results of
these investigations will be analyzed and extended. Explanations
of the findings will be sought in the subcycle effort data. The
assumptions made will also be checked. In addition, the Rayleigh
model will be used to classify projects in terms of difficulty
and the results will be compared to management's ranking of these
systems. The contractor's algorithm will also be reviewed in
light of the Rayleigh curve and the results of the studies
conducted.


4. ANALYSIS OF TOTAL WEEKLY EFFORT

In this portion of the paper the work done by previous investiga- 
tors will be extended to determine whether or not the supposition 
that the Rayleigh curve fits the total weekly effort distribution 
is true. 

First, the results of using the model to predict project
parameters are studied further since previous results have been
inconclusive. Factors which support or refute the Rayleigh model
are sought. The Rayleigh curve is used to size projects. If the
curve can be used in this fashion, this would lend support to the
model. Secondly, curve fitting is attempted. If the Rayleigh
curve fits the data well and it is a better fit than other
curves, it would make it a likely model. Finally, an attempt at
finding general trends in the weekly effort distribution is made
to see what similarities can be found with the Rayleigh curve.

4.1 Predictions of ta, yd, and K 

The equations used by Basili and Zelkowitz and by Mapp were
accurate in predicting time to acceptance testing (ta). However,
even though these equations were derived from the Rayleigh
equation, it is not necessarily true that the accurate predictions
are due to the Rayleigh model. In the first place, the predictions
of the other parameters, maximum manning (yd) and total effort (K),
were not as good as those of ta. Also, the Rayleigh curves
resulting from these predictions did not fit the data well and in
fact were not the best fitting curves. Further evidence that
accurate predictions are coincidental rather than a product of the
Rayleigh model is obtained by comparing the two sets of equations
used: Basili and Zelkowitz's and Mapp's.

First it must be noted that the equations used by Mapp were
not directly derived from the Rayleigh equation. Mapp used a
shaping factor given by the expression a = 1/(td**2). This is not
the shaping factor defined by the Rayleigh equation. The shape
factor (a) is obtained directly from the Rayleigh equation by
setting the derivative of y' to zero at the peak, t = td, as
follows.

y' = 2*K*a*t*exp(-a*t**2)

y'' = dy'/dt = 2*K*a*exp(-a*t**2)*(1-2*a*t**2) = 0 at t = td

a = 1/(2*td**2)
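
As a numerical check on the derivation above, the following Python sketch locates the peak of the effort curve on a fine grid for the two shape factors. The helper names and the values of K and td are hypothetical. With a = 1/(2*td**2) the peak falls at td; with a = 1/(td**2) it falls at td/sqrt(2), since the maximum of the curve always lies at t = 1/sqrt(2*a).

```python
import math

def rayleigh_rate(t, K, a):
    """Effort rate y'(t) = 2*K*a*t*exp(-a*t**2)."""
    return 2.0 * K * a * t * math.exp(-a * t ** 2)

def peak_time(K, a, t_max, steps=100000):
    """Locate the maximum of the rate on a uniform grid over (0, t_max]."""
    best_t, best_y = 0.0, 0.0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        y = rayleigh_rate(t, K, a)
        if y > best_y:
            best_t, best_y = t, y
    return best_t

# Hypothetical values, for illustration only.
K, td = 10000.0, 20.0

peak_rayleigh = peak_time(K, 1.0 / (2.0 * td ** 2), 4 * td)  # close to td
peak_mapp = peak_time(K, 1.0 / td ** 2, 4 * td)              # close to td/sqrt(2)
```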

Since the shape parameter is defined by the Rayleigh curve, the
equations used by Mapp are not really derived from the Rayleigh
curve. Yet the equations used by Mapp gave predictions of ta at
least as good as Basili and Zelkowitz's. The results are
summarized in Table 1.

|        |    | MANAGEMENT | BASILI &  |           |        |
|        |    | ESTIMATES  | ZELKOWITZ | MAPP      | ACTUAL |
|        |    |            | ESTIMATES | ESTIMATES | DATA   |
|--------|----|------------|-----------|-----------|--------|
| PROJ 1 | ta |         48 |        50 |        60 |   60.8 |
|        | yd |        280 |       406 |       349 |    435 |
|        | K  |      15600 |     10752 |     12508 |  20302 |
|--------|----|------------|-----------|-----------|--------|
| PROJ 2 | ta |         39 |        58 |        50 |   47.8 |
|        | yd |        280 |       417 |       356 |    509 |
|        | K  |      13000 |      8736 |     10189 |  16762 |
|--------|----|------------|-----------|-----------|--------|
| PROJ 3 | ta |         46 |        63 |        54 |   54.5 |
|        | yd |        240 |       330 |       285 |    340 |
|        | K  |      12133 |      8832 |     10228 |  13288 |
|--------|----|------------|-----------|-----------|--------|
| PROJ 4 | ta |         48 |        62 |        53 |   60.8 |
|        | yd |        280 |       361 |       310 |    489 |
|        | K  |      13867 |      9408 |     12508 |  14006 |
|--------|----|------------|-----------|-----------|--------|
| PROJ 8 | ta |         13 |        16 |        14 |     29 |
|        | yd |         40 |        50 |        43 |     99 |
|        | K  |        520 |       384 |       484 |    947 |
|--------|----|------------|-----------|-----------|--------|
| PROJ 9 | ta |         65 |        63 |        54 |    *** |
|        | yd |        120 |       117 |        97 |    148 |
|        | K  |       6067 |      5460 |      7312 |   4963 |

Table 1. Project Data and Rayleigh Predictions

Since the equations derived from the Rayleigh equation do not
give better predictions than those that are not, the Rayleigh
curve would not seem to be responsible for the accurate
predictions of ta. The fact that time to acceptance is predicted
accurately would seem coincidental. Whether or not the Rayleigh model
is responsible for the results obtained does not invalidate these
findings. For the SEL environment these equations seem to work
well. However, the model cannot be validated in this manner.

4.2 Curve Fitting and Smoothing of Total Weekly Effort 

In this subsection, the work done by Mapp and by Basili and Beane to
find the best fitting curves for the total weekly effort data is 
extended to determine if a more definite conclusion can be drawn 
from their work. The work these researchers have done has been 
based on the supposition that if the Rayleigh curve is indeed the 
underlying man-power curve, then it would also be the best fit- 
ting curve. Their results have been mixed. They were not able to 
tell which curve best fit the data. An analysis of runs is used 
to see whether or not a best fitting curve can be selected from 
the set studied. Data smoothing is used to evaluate the fits and 
see whether or not better fits can be obtained. 

A time sequence plot of residuals for each of the fitted
curves obtained by Mapp and by Basili and Beane was made. An
analysis of runs was performed to measure the goodness of fit of
the calculated curves. The residuals were obtained by taking the
difference between the effort in man-hours expended in week t
(the data) and the distribution curve evaluated at t. The
analysis of runs consisted of the following. Assuming that the
data is randomly distributed about the fitted curve, it
would be expected that there would be an approximately equal
number of positive and negative residuals. For example, if there
are 50 observations (data points) and the first 25 residuals are
positive and the rest are negative, it is not likely that the
data points are randomly distributed about the fitted curve. It
is on this concept that the analysis of runs is based. A run is
simply a grouping of consecutive residuals that are all positive
or all negative. In the example just given there are two runs. If
the set of residuals exhibits the following pattern of signs,
(--++--++--), then there are a total of five runs.
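
Counting runs is mechanical and can be sketched in a few lines of Python. The function names are hypothetical, and skipping residuals that are exactly zero is an assumption made here, since the text does not say how such values were treated. The two examples are the ones given in the paragraph above, with the sign pattern written as --++--++--.

```python
def count_runs(residuals):
    """Count maximal groups of consecutive residuals with the same sign."""
    # Zero residuals are skipped (an assumption; not specified in the text).
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    changes = sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    return 1 + changes

def residuals_from_signs(pattern):
    """Turn a string of '+' and '-' characters into unit residuals."""
    return [1.0 if ch == "+" else -1.0 for ch in pattern]

# First 25 residuals positive, remaining 25 negative: two runs.
first_example = count_runs([1.0] * 25 + [-1.0] * 25)

# The sign pattern from the text: five runs.
second_example = count_runs(residuals_from_signs("--++--++--"))
```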

The number of runs should increase as the number of observa- 
tions increases. If this is not the case, then it is unlikely 
that the points are randomly distributed about the fitted curve. 
Therefore the number of runs as a function of the number of
observations can act as a measure of goodness of fit. The question
to be asked is: assuming the data points are randomly distributed
about the fitted curve, what is the probability of getting this
number of runs given n1+n2 observations, where n1 and n2 are the
numbers of positive and negative residuals?

To obtain this probability, a normal approximation to the
actual discrete distribution was used. (*) The mean and the
variance of the normal population were used to estimate the actual
mean and variance. The mean and the variance of this distribution
are given by:

mean = (2*n1*n2)/(n1+n2) + 1

var = (2*n1*n2)*(2*n1*n2-n1-n2)/(((n1+n2)**2)*(n1+n2-1))

The unit normal deviate (**) is given by:

z = (number of runs - mean + .5)/sqrt(var)

The results of the analysis are tabulated in Tables 2A and 2B.
The number of positive and negative residuals, the number of runs
and the unit normal deviate for each of the curves obtained by
Mapp and by Basili and Beane are given.

(*) The discrete distribution referred to is defined as follows:
    given X sets of data points, each set containing n1+n2
    points randomly distributed about a fitted curve,
    each set will have some number of runs (Ri, i = 1,X)
    associated with it. The set of all Ri will form a discrete
    distribution about the expected value of R (number of runs)
    given n1+n2 data points. Since the data points are randomly
    distributed about the fitted curve, the discrete distribution
    formed by the set of Ri should be approximated by a normal
    distribution. The mean of this normal distribution will be
    approximately equal to the expected value (R) of this
    distribution.

(**) The unit normal deviate is an approximation of the
     standardized deviation for the discrete distribution. The
     unit normal deviate is calculated using the mean and variance
     of the normal distribution, which only approximate the mean
     and variance of the actual discrete distribution.

Assuming a normal distribution, the probability that the
value of Z is less than -1 is .16 (P(Z < -1) = .16). Evaluating the
results in Tables 2A and 2B we can conclude with some confidence
that the pattern of residuals is not random. This indicates that
effort data is not randomly distributed about any of the fitted
curves. This fact together with the evaluation of the sum of
least squares obtained by Basili and Mapp indicates that none of
the fitted curves is the underlying curve for the total weekly
effort data. The plots of the two best fitting Rayleigh curves
(Projects 1 and 4) are given in Figures 1 and 3.
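
The normal approximation described above can be sketched in Python as follows. The function names are hypothetical. The example reuses the 50-observation case from the discussion of runs (25 positive residuals, 25 negative, two runs) rather than any value from Tables 2A and 2B, and the .5 term is the correction that appears in the formula for z.

```python
import math

def runs_statistics(n1, n2, runs):
    """Return (mean, var, z) for the normal approximation to the runs count."""
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = (2.0 * n1 * n2) * (2.0 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    z = (runs - mean + 0.5) / math.sqrt(var)
    return mean, var, z

def normal_cdf(z):
    """P(Z < z) for a unit normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# 25 positive and 25 negative residuals arranged in only two runs.
mean, var, z = runs_statistics(25, 25, 2)
p_lower = normal_cdf(z)           # probability of so few runs by chance

# Reference point quoted in the text: P(Z < -1) is about .16.
p_minus_one = normal_cdf(-1.0)
```

For this example the expected number of runs is 26, and the observed two runs lie more than six standard deviations below it, so a random scatter about the curve is very unlikely.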

The data analyzed included manager and support personnel 
hours as well as programmer hours. In order to determine whether 
or not manager and support personnel hours could be responsible 
for introducing the deviations from the Rayleigh curve shape, 
programmer hours were isolated, plotted and fitted. The fitted 
curves did not seem to fit the programmer effort distribution any 
better. 

Basili and Zelkowitz had observed that the total weekly
effort data contained a lot of noise, which was responsible
for the deviations. The data was smoothed to determine to what
extent noise in the data was responsible for the deviations from
the Rayleigh curve. The smoothing was done by calculating a
running average over five-week intervals: the effort for the two
weeks before and the two weeks after was averaged with the effort
for the current week. This was done for each of the projects. The
plots for projects 1 and 2 for the smoothed data are given in
figures 2 and 4. The best fitting Rayleigh curves were calculated
for seven projects.

The smoothing revealed that the data did not follow the
Rayleigh curve. A simple visual inspection of the plots
demonstrates this. An analysis of runs in this context is not
meaningful since the data has been transformed by the running
average and neighboring points are therefore no longer
independent.
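
The five-week running average used for the smoothing above can be sketched in Python. The function name and the sample data are hypothetical. The text does not say how the first and last two weeks were handled; truncating the window at the ends of the series is an assumption made here.

```python
def running_average(effort, half_width=2):
    """Centered running average over 2*half_width + 1 weeks.

    Windows are truncated at the ends of the series (an assumption;
    the treatment of the end points is not specified in the text).
    """
    smoothed = []
    for i in range(len(effort)):
        lo = max(0, i - half_width)
        hi = min(len(effort), i + half_width + 1)
        window = effort[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical weekly man-hours with a holiday-style dip in the fourth week.
weekly = [100, 120, 140, 40, 150, 160, 150]
smooth = running_average(weekly)
```

In this sample the dip to 40 man-hours is replaced by the five-week mean of 122, which illustrates how a single short week is spread across its neighbors.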

The same analysis was carried out for total weekly programmer
hours and similar conclusions were drawn. The plots for
project 2 are given in figures 5 and 6.

It is seen that a choice among the curves used by Mapp and by
Basili and Beane is impossible. None of the curves fits the data
well enough. Two things give us evidence of this. First, a choice
among the curves on the basis of the error criterion used (sum of
errors squared) was not possible. This was observed by both sets
of researchers. Second, the analysis of runs indicates that
the data points are not randomly distributed about the fitted
curves. The sum of errors squared indicates that the fits are not
good and the analysis of runs confirms this result.

The Rayleigh curve is very definitely not an adequate model 
for the total weekly effort data studied, but no other curve 
seems to do any better. In fact, it is impossible to state 
whether or not there is a single curve type which best fits all 
or even the majority of the projects studied. Attention is now 
turned to studying any trends in the data. 

4.3 General Trends in the Total Weekly Effort Distribution 

In this subsection, an attempt is made to find general trends in 
the data to see what similarities can be found with the Rayleigh 
curve. If the trends in the data can be explained and the devia- 
tions from the Rayleigh curve can be accounted for, then further 
study of the Rayleigh curve may help explain other behavior in 
the effort distribution of medium scale software projects. The 
data is smoothed to make the trends easier to identify. 

Two major trends are observed. First, every project exhi-
bited several peaks in total effort expended. This is in contrast
to the single peak representing maximum manning exhibited by the
Rayleigh curve. Second, the "more successful" (*) projects all
seemed to have the same man-power pattern - a quick rise, a
series of peaks, followed by a steady decline. The "less success-
ful" projects did not exhibit this behavior. The Rayleigh curve
suggests a rise, peaking and exponential tail off pattern for
"successful" projects.

The first of these trends could be explained by the 
occurrence of holidays. Basili and Zelkowitz had noted that a 
significant and very noticeable decline in effort occurred during 
the holidays. For project 2 (figures 3 and 4), Christmas and New
Year's occurred in weeks 12 and 13, Easter in week 28 and Indepen-
dence Day fell in week 39. These holidays match up exactly with
effort slow downs observed in the data. The same observations
were made for project 1 (figures 1 and 2), and for the five other
projects studied. The slow downs can therefore be reasonably 
attributed to employees taking holidays. This acts as noise in 
the data. 

The reason that Putnam did not observe any effect from holi-
days in the large projects he studied is that in large pro-
jects effort expenditure is gathered on a monthly or yearly basis
instead of a weekly basis. This causes the effects of the holi-
days to go unnoticed. Since the presence of multiple peaks in the
effort data can be explained by the "noise" due to holidays, it
can be concluded that the underlying curve for this man-power
data should have a single peak.

* Classification of projects as more or less "successful" is
based on a subjective evaluation made by management.
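
The masking effect of coarser reporting periods can be illustrated with a small sketch. The weekly figures below are hypothetical, and four-week blocks are used as a stand-in for monthly reporting; neither is taken from the SEL or Putnam data.

```python
def aggregate(effort, period):
    """Sum consecutive blocks of `period` weeks (the last block may be shorter)."""
    return [sum(effort[i:i + period]) for i in range(0, len(effort), period)]

def count_peaks(series):
    """Count interior points strictly higher than both neighbours."""
    return sum(1 for i in range(1, len(series) - 1)
               if series[i - 1] < series[i] > series[i + 1])

# Hypothetical weekly hours: one underlying hump with two holiday dips.
weekly = [10, 20, 30, 40, 50, 15, 70, 80, 90, 100,
          95, 30, 85, 80, 70, 60, 50, 40, 30, 20]
four_weekly = aggregate(weekly, 4)

weekly_peaks = count_peaks(weekly)        # several peaks at weekly resolution
monthly_peaks = count_peaks(four_weekly)  # a single peak after aggregation
```

In this example the holiday dips create extra peaks in the weekly series, while the four-week totals show only the single underlying peak.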

The second observation concerns the general shape of the
man-power curves for projects regarded as "more successful" by
management. It was observed that the "more successful" projects
exhibited man-power patterns characterized by a rise in effort,
followed by several peaks and a steady decline. In contrast, a
"less successful" project exhibited a slower man-power build-up,
peaking very close to delivery, followed by a very sharp decline.
(This can be explained by the need to finish quickly. An attempt
is made to deliver a project on time by adding more manpower.) If
the noise due to holidays is smoothed, the "more successful" pro-
jects would exhibit a man-power pattern of rise, peak and
decline. The behavior characteristic of the "successful" projects
is exhibited by both projects 1 and 2 (figures 1-4), while pro-
ject 4 (figures 7-8) is an example of a "less successful" pro-
ject.


FIGURE 7. PROJECT 3 - EFFORT, WEEKLY TOTALS

These examples are insufficient to support any conclusion.
But Putnam has made similar observations for the projects he stu-
died. Putnam states,


"Many of these also exhibited the same basic man-power
pattern - a rise, peaking and exponential tail off as a
function of time. Not all systems follow this pattern.
Some man-power patterns are nearly rectangular; that is,
a step increase to peak effort and a nearly steady effort
thereafter. There is reason for these differences. It is
because man-power is applied and controlled by manage-
ment. Management may choose to apply it in a manner which
is suboptimal or contrary to system requirements. Usu-
ally, management adapts to system signals, but generally
responds late because the signal is not clear instantane-
ous with the need." (Putnam1, pg.348)


This suggests that the optimal manloading curve follows a pattern
similar to that suggested by the "more successful" projects. Pro-
ject 3 (figures 9 and 10), which was considered a "successful"
project, exhibits some semblance of a rectangular pattern as Put-
nam describes. These factors can explain why the Rayleigh curve
does not model the data well. Other explanations are given in the
next section.

Thus far it has been shown that the Rayleigh curve is not
responsible for the accurate predictions of the time to accep-
tance testing (ta) obtained by Basili and Zelkowitz and by Mapp.
Since the Rayleigh curve did not fit the total weekly effort data
well, it cannot be said that the Rayleigh curve is an adequate
model for this data. However, there are some explanations for
why the Rayleigh curve does not adequately fit the data (the
effect of holidays on the effort distribution of small scale
software projects). Also, there is some suggestion that the
effort distributions for "successful" projects do follow a pat-
tern similar to the Rayleigh curve (a rise, peaking and decline).
Therefore, further investigation of the Rayleigh curve may prove
helpful in determining some of the characteristics of the effort
distribution for medium scale software projects.

5. SUBCYCLE DATA ANALYSIS

In this section, reasons for the Rayleigh curve's failure to
model adequately the total weekly effort data are investigated.
Specifically, the effort distribution over individual subcycles
is studied to gain further insight into the behavior of man-power
distribution curves.

Four assumptions have been made in previous research. Based
on Putnam's claims for large projects, it was assumed that the
subcycles of design, code and test are Rayleigh in shape and that
their sum is also Rayleigh. It was also assumed that the data
used by Putnam and the data gathered at SEL differed only in two
respects: the size of the projects studied and the phases of the
life-cycle effort for which data was gathered. In addition, it
was assumed that the subcycles for medium scale projects were
distributed in the same fashion as large projects and that the
effect of adding these subcycles would result in a similar total
effort distribution. Finally, it was implicitly assumed that the
manner in which large projects are developed is similar to the
development of medium scale systems. These assumptions are
checked to determine the adequacy of the Rayleigh curve as a
model for this environment.

The subcycle effort distributions are studied to determine
whether or not these are Rayleigh in shape. Differences between
the two sets of data are studied further to see if the Rayleigh
equation is being applied to the type of effort data it was
intended to model. The effects of holidays and of relative mile-
stones are examined to determine how the summing of subcycle
effort distributions for medium scale systems is different from
that for large scale systems. Finally, general trends in the subcycle
effort distributions are studied in order to understand the dynamics
of medium scale system development. Possible explanations for the
model's failure are set forth.

5.1 Curve Fitting and Smoothing of Subcycle Effort 

Basili and Zelkowitz initially hypothesized that since the SEL
data only included the effort from design through acceptance
testing, at least the central portion of the Rayleigh curve should
serve as an adequate model for effort distribution. They based
this hypothesis on Putnam's claim that for large scale software
systems the design/code and testing subcycles are Rayleigh in
shape and their sum is also Rayleigh. The hypothesis is tested
here in light of this underlying assumption. The design, develop
(code) and test subcycles are smoothed and fitted with the Ray-
leigh curve to determine how well the Rayleigh curve models
effort distribution over the subcycle.

The time spent each week on design, coding and test was
calculated using the Component Status Report (CSR) data. This
data was then smoothed and plotted. Because of the large volume
of plots resulting for the seven projects used, not all of them
could be included here. Some sample plots are given in Appen-
dix A. The best fitting curve for all three phases of each pro-
ject was found using the linear least squares method used by
Mapp. Mapp had noticed that the curves obtained using this method
were not the best fitting curves. This was due to the fact that
when the data is linearized it is distorted. This distortion is
made more pronounced when there is a large variance in the magni-
tudes of the data being fitted. This was not the case for the
subcycle effort data. Therefore this method was regarded as ade-
quate for this application. The results of the curve fits and
the analysis of runs are given in Table 3.

NOTE: Assuming a normal distribution, the probability that
the value of Z is less than -1 is .16 (i.e.,
P(Z < -1) = .16).


Table 3. Curve Fitting Results for Design, Coding and Test

The analyses of runs indicate deviations of the data from
the Rayleigh curve. Visual inspection of the plots showed that
none of the subcycle effort distributions followed a Rayleigh
curve with the possible exception of the coding effort data. How-
ever, the Rayleigh curve that best fit the coding data did not go
through the origin. Since the form of the Rayleigh curve used to
fit the data had to go through the origin, it did not result in
the best fitting Rayleigh curve. Therefore, to improve the fits
obtained for the coding phase, a coordinate translation was per-
formed and the resulting data was fitted using the same method.
Table 4 presents the parameters of the resulting Rayleigh equa-
tions and the analysis of runs for five of the projects. No coor-
dinate translation was done for project 4 because it was not
necessary. The fits were not available for project 7. Figures 11
through 14 give the plots for two of the projects which exhibited
the closest fit.

Except for project 1, the fits for the coding phase curve
were improved by the coordinate translation. The exception arises
because the translation used for project 1 cut off the beginning
portion of the man-power curve. Overall the curves fit the data
considerably better after translation.
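
A linearized least squares fit and a coordinate translation of the kind described in this subsection can be sketched as follows. Several things here are assumptions rather than details taken from the study: the Rayleigh rate is taken in the form y = 2*K*a*t*exp(-a*t**2), the linearization used is ln(y/t) = ln(2*K*a) - a*t**2 (which may differ from Mapp's exact procedure), and the translation is found by trying a list of candidate shifts of the time origin. All function names and the synthetic data are hypothetical.

```python
import math

def rayleigh(t, K, a):
    """Rayleigh man-power rate y(t) = 2*K*a*t*exp(-a*t**2)."""
    return 2.0 * K * a * t * math.exp(-a * t * t)

def fit_linearized(times, effort):
    """Least squares line through the points (t**2, ln(y/t)); returns (K, a)."""
    xs = [t * t for t in times]
    ys = [math.log(y / t) for t, y in zip(times, effort)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    a = -slope                               # slope of the line is -a
    K = math.exp(intercept) / (2.0 * a)      # intercept is ln(2*K*a)
    return K, a

def fit_with_translation(times, effort, shifts):
    """Try each shift of the time origin; keep the smallest sum of squared errors."""
    best = None
    for shift in shifts:
        # Points at or before the shifted origin are cut off.
        pts = [(t - shift, y) for t, y in zip(times, effort)
               if t - shift > 0 and y > 0]
        K, a = fit_linearized([p[0] for p in pts], [p[1] for p in pts])
        sse = sum((y - rayleigh(t, K, a)) ** 2 for t, y in pts)
        if best is None or sse < best[0]:
            best = (sse, shift, K, a)
    return best

# Synthetic coding effort that starts five weeks after the project origin.
times = list(range(6, 26))
effort = [rayleigh(t - 5, 1000.0, 0.02) for t in times]
sse, shift, K, a = fit_with_translation(times, effort, shifts=range(0, 6))
```

Because the fit is done on ln(y/t), small effort values carry as much weight as large ones, which is one way to see the distortion that linearization introduces. A shift that is too large discards the early weeks, as happened for project 1.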

From this analysis it is seen that none of the subcycles can 
truly be said to follow a Rayleigh man-power distribution with 
the exception of the coding subcycle. The assumption that the 
design, code and test subcycles are Rayleigh in shape is not 
true. Therefore it is unlikely that the total weekly effort would 
assume a Rayleigh shape either. But, the fact that the code 
effort is approximated by the Rayleigh curve is significant and 
cannot be ignored. 

5.2 Comparison of Putnam and SEL data 

In this subsection, the differences between the data studied by
Putnam and that used for this study are examined more closely.
The actual data used by Putnam was not available for the purposes
of this study. However, this data was not needed to conduct this
comparison. The purpose of this comparison is to determine what
type of data was included in Putnam's study. This information can
be obtained from the literature.

As has been mentioned earlier, there are two types of weekly
effort which are reported in the SEL environment. What was stu-
died in the previous section was called the total weekly effort
and consisted of the total effort expended on the project by all
personnel. This data is gathered in the RS form. When the data
from the CSR form is totaled by week, what is obtained is the
total effort expended in development activities. This is given
the name total weekly development effort here.

In Putnam's life cycle diagrams (figure 15), management
effort is given a separate curve. SEL's total weekly effort
includes management effort and other overhead charges, whereas
the total weekly development effort does not. Therefore the total
weekly development effort data more closely matches the effort
data regarded by Putnam to properly belong to the subcycles of
design/code and testing. When the total weekly development effort
data was smoothed and fitted, the Rayleigh curve fit the weekly
development effort better than it did the total weekly effort.
(Examples can be seen in Appendix A in graphs labeled WEEKLY
CHARGES.) This perhaps indicates that the Rayleigh curve should
be regarded as a model for development effort and not as an esti-
mate for the budget type data which makes up the total weekly
effort. The fits obtained were still not very good, however, and
no definite conclusion can be drawn. Furthermore, this observation
is confined to the SEL environment.


Figure 15. Putnam's Life-Cycle Diagram

5.3 General Trends in the Total Weekly and Development Distributions

In this subsection, the effects that holidays have on subcycle
effort distributions, and how these relate to the disturbances
observed in the total weekly effort distribution, are studied.
Also, the effect of changing the relative start times of each
phase of a project is studied in light of how shifting relative
start times influences the shape of the overall profile curve.
What is being sought are possible factors which may cause a
project's effort distribution not to be Rayleigh.

It is assumed that the forces acting on the total weekly 
effort distribution are similar to the forces acting on the total 
weekly development effort. Therefore studying the subcycle data 
will serve to explain both observations made about the total 
weekly effort as well as the total development effort. 

The subcycle data was first studied to see what kind of 
effect holidays had on effort expended. It was found that the 
occurrence of holidays could not be associated with any of the 
major effort slow downs observed in the individual subcycle data. 
When the development effort was inspected as a sum (that is, when 
these subcycles were summed together), the holidays could be seen 
to correspond to all major slow downs. In other words, the holi- 
days did not cause any noticeable noise at the subcycle level, 
but the cumulative effect of adding the subcycles to obtain the 
total weekly development effort made them apparent. This
corresponds to the observation made earlier of the effects of
holidays on total weekly effort. Other noise in the data seems to 
cancel out because it occurs randomly. For example, not everybody 
gets sick at the same time. The noise due to holidays is not ran- 
dom - most people like to take vacations around holidays. There- 
fore, the effects do not cancel but are reinforced when the sub- 
cycles are added together. 
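
The argument that random noise tends to cancel while holiday effects reinforce can be made concrete with an idealized calculation. The model below is an assumption introduced only for illustration: each of m subcycles has independent zero-mean weekly noise with standard deviation \sigma, and every subcycle loses the same amount h in a holiday week.

```latex
\begin{align*}
e_i(t) &= s_i(t) + \varepsilon_i(t) - h(t), \qquad i = 1,\dots,m \\
E(t) = \sum_{i=1}^{m} e_i(t)
  &= \sum_{i=1}^{m} s_i(t) + \sum_{i=1}^{m} \varepsilon_i(t) - m\,h(t) \\
\operatorname{sd}\!\Big(\sum_{i=1}^{m} \varepsilon_i\Big) &= \sigma\sqrt{m},
\qquad
\frac{\text{holiday dip in the sum}}{\text{random noise in the sum}}
  = \frac{m\,h}{\sigma\sqrt{m}} = \sqrt{m}\,\frac{h}{\sigma}
\end{align*}
```

Under these assumptions the holiday dip stands out from the random noise by a factor of \sqrt{m} more in the summed effort than in any single subcycle; for the three subcycles of design, code and test this factor is about 1.7.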

It can be said that the reason holidays have such an impact 
on the total weekly effort is because of the cumulative effects 
introduced by these non-random disturbances. For large scale sys- 
tems these disturbances are insignificant and therefore are not 
noticeable. This gives one key as to why the total weekly effort 
for medium scale systems may not be Rayleigh in shape. 

Another reason can be given by making some observations 
about the relative start dates of each phase of a project. Putnam 
observed that for large projects the relative dates for mile- 
stones (the start dates for different phases of a project) were 
similar. What would be the effect of changing the project mile-
stones on the overall curve?

To answer this question, sums of pairs of Rayleigh curves
were considered. They are illustrated in figures 16 through 18.
Depending on how these curves were shifted, illustrating the
changing of milestone dates, the sum of these curves varied. The
resulting sums were a curve which looked like a Rayleigh curve
with some "noise" (figure 16), a curve which has two distinct
peaks (figure 17), and a curve which could be modeled by a para-
bola if its tail end were eliminated (figure 18).
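
The effect of shifting one Rayleigh curve relative to another can be reproduced with a short sketch. The parameters are hypothetical and are not those used for figures 16 through 18; the two curves are assumed to be identical except for their start dates, and the Rayleigh rate is assumed to have the form y = 2*K*a*t*exp(-a*t**2).

```python
import math

def rayleigh(t, K, a):
    """Rayleigh man-power rate; zero before the phase starts."""
    if t <= 0:
        return 0.0
    return 2.0 * K * a * t * math.exp(-a * t * t)

def summed_profile(offset, weeks=60, K=100.0, a=0.02):
    """Sum of two identical Rayleigh curves, the second starting `offset` weeks later."""
    return [rayleigh(t, K, a) + rayleigh(t - offset, K, a) for t in range(weeks)]

def count_peaks(series):
    """Count interior points strictly higher than both neighbours."""
    return sum(1 for i in range(1, len(series) - 1)
               if series[i - 1] < series[i] > series[i + 1])

close_milestones = count_peaks(summed_profile(offset=2))    # phases nearly overlap
spread_milestones = count_peaks(summed_profile(offset=20))  # phases well separated
```

With a small offset the sum keeps a single peak and still resembles a Rayleigh curve, while a large offset produces two distinct peaks.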

It follows therefore that one of the reasons that large
scale projects have Rayleigh shaped effort distributions is
the particular arrangement of milestones among systems of this
size. One of the reasons that the smaller projects studied
here do not exhibit a Rayleigh shape is the difference in
relative milestone dates. Further evidence for this
difference lies in Putnam's observation about the time to reach
peak effort. Putnam has indicated (*) that for smaller projects
the time to peak effort is half the time it takes to complete
development. This is unlike large projects where peak effort
occurs at the end of development. This suggests that the relative
start dates for each phase in a small project are different from
those for a large project.

These two factors, holidays and shifting milestones, affect
the shape of the overall effort distribution because of their
cumulative effects on the project profile curve. These observa-
tions help give some explanation about why the Rayleigh curve may
not model the total weekly effort distribution for projects with
Rayleigh shaped subcycle effort distributions. However, the pro-
jects studied here do not have Rayleigh shaped subcycle curves.
Explanations for this latter phenomenon are sought in the next
subsection.

* This information was given to the SEL in a private
communication.

5.4 General Trends in the Subcycle Effort Distribution 

Thus far it has been seen that the subcycle man-power curves for
design and testing were not Rayleigh in shape. Only the coding man-power
curve seemed to be modeled by the Rayleigh curve. This has helped
explain why the overall project curves were not Rayleigh, but it
still leaves us with two questions: why are not all three of these
subcycle curves Rayleigh in shape as Putnam proposed, and why is
it that the coding man-power curve seems to be Rayleigh?

We can explain these phenomena by taking a closer look at
the SEL environment itself. Basili and Beane pointed out that the
SEL environment was not typical because of the contractor's inti-
mate familiarity with the problem area and because of the simi-
larity of the programs. This has a great deal of bearing on the
shape of the man-power curves. First, however, we must point out
what is unique about this environment.

At the SEL, managers use a heuristic algorithm (this algo-
rithm was given earlier). When Basili and Beane examined this
algorithm they found that managers were indeed making use of it.
What is unusual is that managers could seemingly apply personnel
at any point of the project without having any major adverse
effects on the development process. This is because the projects
were so similar in nature and the personnel were so familiar with
the application area that it did not take them very long to come
up to speed on a new project. Their learning curve was signifi-
cantly reduced. This is not a typical situation.

The significance of this lies in its effect on project
development and, as a consequence, on the man-power
curves. Seeing that the SEL environment is very different from
those studied by Putnam, it does not come as a surprise that the
effort distribution curves do not seem to match. We could leave
our explanation at that were it not for the man-power curve for
the coding subcycle. The coding curve seems to be modeled well by
the Rayleigh curve. This needs to be examined further.

Because of the contractor's familiarity with the application 
area we would expect that the effort expended on new projects 
would be considerably less than if the problem area were unfami- 
liar to the people working on the project. Furthermore, we would 
also expect that the effort expended would be applied optimally 
or nearly so since they had done this sort of thing before. 

The fact that the problem space is reduced significantly
impacts the design effort since it is in the design phase that
many of the problems need to be solved. In the SEL environment
many of the problems are solved even before the project begins.
This is what allows management to allocate as much as 1/2 to 3/4
of "full staffing" at the very beginning of the project. This is
considerably different from what is suggested by the Rayleigh
curve. The Rayleigh curve suggests that the design curve should
go through the origin. Not only is there a shifting of the man-
power curve but also the very shape of the man-power curve is
affected. Problems will be handled in a very different manner.
More problems will be done in parallel. Also the learning curve,
which is what determines the man-power curve, is almost non-
existent. These factors cause the man-power curve to exhibit a
different shape. The design curve in the SEL environment is not
Rayleigh because the environment is significantly different.

Unlike the design man-power curve, the man-power curves for
coding and testing should not be as affected by these particular
differences in environment. This is because the coding and test-
ing phases are not as greatly affected by the personnel's fami-
liarity with the problem area except perhaps in doing things more
efficiently. The basic problem of coding from a design specifi-
cation remains unchanged. Except for a possible reduction of some
types of coding errors and lifting some code directly from a pre-
vious project, the problems encountered will be the same and there
will be just as many of them. Modifying code, by the contractor's
own account, is difficult and brings its own set of problems, so
no great advantage is gained by this. Because the problem set is
not significantly changed, the problem of coding (translating a
design specification into a programming language) remains
unchanged. This of course means that there will be little if any
impact on the overall shape of the man-power curve, except that it
might reach peak effort at an earlier date. Furthermore, since the
contractor has a lot of experience with the application, his
effort will be expended in an almost optimal manner. This means that
the man-power curve for the coding subcycle will be very close to
the optimal man-power curve for coding for medium scale systems.
Since this curve seems to be Rayleigh in shape, this suggests
that the Rayleigh curve might be the optimal manloading curve for
coding in a typical environment. This agrees with Putnam's obser-
vations.

We are still left with one problem. The testing man-power 
curve is not Rayleigh in shape. Also the factors causing the 
design curve to be different do not have a significant impact on 
the testing curve for the same reasons that they did not influ- 
ence the coding curve. But, it was noted previously that the 
testing subcycle was most likely made up of two phases: module
testing and integration testing. The man-power curves for these
two phases taken separately may very well be Rayleigh in shape. 
There is no way of telling whether this is true or not from the 
SEL data however. 

The implications of these results are important even though
they may not be conclusive. The Rayleigh model continues to be a
good candidate man-power curve for medium scale environments. It
must be noted that Parr has offered a different explanation for
why the design curve does not go through the origin. The Parr
model therefore cannot be discarded as a possible candidate
either. Further work is definitely warranted.

6. USE OF SOFTWARE PARAMETERS TO CLASSIFY SYSTEMS

Putnam has derived relations for difficulty, system size and a
measure of the state of technology. In this section, the possi-
bility of using these relations to classify medium scale projects
is studied.

Putnam observed that for large projects the relation
D=K/td**2 acted as a measure of difficulty in terms of the pro-
gramming effort and the time to produce the system. He also
derived a relation between the number of source lines of code,
the effort and the time to produce it. This is given by the equa-
tion: Ss=Ck*(K**(1/3))*(td**(4/3)), where Ss is the number of source
lines, K is the total effort, td is the time to reach peak effort
and Ck is the state of technology. Putnam observes that Ck "seems
to relate to machine throughput (or programmer turnaround, avail-
able test time, etc.) and other technological improvements like
Chief Programmer Team, Top Down Structured programming, on-line
interactive job submission, etc." Since the data studied by Put-
nam differs from the data studied here, these relations cannot be
applied directly. A new set of equations is derived using
Putnam's techniques.

The fundamental difference that must be considered is that
SEL data includes effort expended through acceptance testing
only, while the data studied by Putnam includes the entire life-
cycle through maintenance. Because of this the time to reach peak
effort reported in the SEL data does not correspond to td, the
time to development, nor does the total effort reported in the
SEL data correspond to K, the total effort through maintenance.
The time to reach peak effort reported in SEL is called tm. The
total effort reported in the SEL data through acceptance testing
is called Ka. tm and Ka are substituted for td and K.

The resulting equations are identical in form to those given
by Putnam. The shaping parameter is given by a1=1/(2*tm**2).
Putnam's difficulty parameter D, given by 2*K*a, is replaced by
the expression D1=2*Ka*a1, where a1=1/(2*tm**2). The equation for
number of source lines is given by Ssel=Csel*(D1**(-2/3))*Ka, where
Ssel and Csel replace Ss and Cn respectively (*). Note that
although these equations are of the same form as Putnam's equa-
tions they have significantly different meaning.
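
The link between the shaping parameter and the time of peak effort can be written out as a short derivation. It assumes the Rayleigh man-power rate in the form y(t) = 2Kat e^{-at^2}; the symbol t_{peak} is introduced here only for the derivation.

```latex
\begin{align*}
y(t) &= 2Ka\,t\,e^{-at^{2}} \\
\frac{dy}{dt} &= 2Ka\,e^{-at^{2}}\left(1 - 2at^{2}\right) = 0
  \;\Longrightarrow\; t_{peak} = \frac{1}{\sqrt{2a}}
  \;\Longrightarrow\; a = \frac{1}{2\,t_{peak}^{2}} \\
D &= 2Ka = \frac{K}{t_d^{2}}, \qquad
D_1 = 2K_a a_1 = \frac{K_a}{t_m^{2}}
\end{align*}
```

The last line shows that D=2*K*a and D=K/td**2 are the same quantity when the peak occurs at td, and that D1 has the same form with Ka and tm substituted.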

These equations were applied to the SEL data. D1 was calcu- 
lated using the parameters of the curves fitted to total weekly 
effort, total weekly development effort and total weekly effort 
spent on coding. The values obtained for D1 are compared for con- 
sistency with a subjective measure of difficulty. This measure is 
given by management. Projects are ranked according to these meas- 
ures and the two rankings are then compared. Table 5 summarizes 
the values of D1 and Table 6 shows a rank ordering of projects 
according to D1. The values of D1 were obtained by using the
values for the total effort and the values of the shaping parame- 
ters were obtained from the least squares fit of the effort data. 

* Ss = Cn*(D**(-2/3))*K

  Ss = Ck*(K**(1/3))*(td**(4/3))


A rank of one (1) is given to the project with the lowest value of
D1, and a rank of five (5) to the project with the highest value.
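
The calculation of D1 and the ranking rule can be sketched as follows. The function names and the sample values of Ka and tm are hypothetical; the D1 values used for the ranking are those listed for development effort in Table 5. Ties are not handled in this sketch.

```python
def shaping_parameter(tm):
    """a1 = 1/(2*tm**2), with tm the time to reach peak effort."""
    return 1.0 / (2.0 * tm ** 2)

def difficulty(Ka, tm):
    """D1 = 2*Ka*a1."""
    return 2.0 * Ka * shaping_parameter(tm)

def rank_projects(values):
    """Return {project: rank}, with rank 1 for the lowest value."""
    ordered = sorted(values, key=values.get)
    return {project: rank for rank, project in enumerate(ordered, start=1)}

# Hypothetical inputs; the units are not taken from the report.
example_d1 = difficulty(Ka=400.0, tm=5.0)

# D1 for development effort as listed in Table 5 (project number -> D1).
d1_development = {1: 17.6, 2: 14.0, 3: 6.73, 4: 12.1, 5: 24.2}
ranks = rank_projects(d1_development)
```

The resulting order from least to greatest D1 is projects 3, 4, 2, 1, 5.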





          D1 FOR          D1 FOR          D1 FOR         D1 FOR
          TOTAL EFFORT    TOTAL EFFORT    DEVELOPMENT    CODING
          (CALCULATED)    (SEARCHED)      EFFORT         EFFORT

PROJ 1    27.7            19.4            17.6           17.6
PROJ 2    18.2            19.6            14.0           14.0
PROJ 3    15.9            14.9            6.73           6.73
PROJ 4    27.6            19.4            12.1           12.1
PROJ 5    33.2                            24.2           24.2


Table 5. Results of Calculations of the Difficulty Parameter


Table 6. Ranking According to D1 Parameter

The projects were evaluated by management using 42
categories of difficulty which were divided into three groupings:
complexity, internal influences and external influences. These all give
some indication of how difficult it was to develop a project. For
each project a value from zero (lowest difficulty) to fifty
(highest difficulty) was assigned to each category. The values
for the categories under each grouping were added and totals for
each project were obtained. The projects were then ranked in the
same way as with D1. The rankings are compared in Table 7.


                least                    greatest

D1              PROJ 3    4    2    1    5

Difficulty      PROJ 5    2    3    1    4


Table 7. Comparison of Management's Ranking of Projects
According to Difficulty and Ranking Obtained from D1


As can be seen there is no correspondence between management's
perception of difficulty and D1. Comparing each one of the group-
ings of difficulty factors (complexity, external and internal
influences) separately does not improve the results.

The next effort was to determine the value of the constant
Csel for each project to see whether projects could somehow be
ordered according to technology or methodology. Management
appraisals of the methodology were used as a basis of comparison.

The equation for Ssel is used. The number of source lines
was defined as the total number of source lines including com-
ments. The values of D1 were taken from Table 5. [BB2] Table 8
summarizes the results. The values used for Ka and D1 are
obtained from Table 5.
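
Solving the source lines relation for the constant can be sketched as below. The sketch assumes the relation has Putnam's form with an exponent of -2/3 on the difficulty, that is Ssel = Csel*(D1**(-2/3))*Ka; the function names and all numeric values are hypothetical and are not taken from Table 8.

```python
def methodology_constant(source_lines, Ka, D1):
    """Csel obtained by solving Ssel = Csel * D1**(-2/3) * Ka for Csel."""
    return source_lines * D1 ** (2.0 / 3.0) / Ka

def source_lines_from(Csel, Ka, D1):
    """Ssel predicted from a given Csel, Ka and D1."""
    return Csel * D1 ** (-2.0 / 3.0) * Ka

# Hypothetical project: 50000 source lines, Ka = 400, D1 = 8.
csel = methodology_constant(50000.0, 400.0, 8.0)
lines_back = source_lines_from(csel, 400.0, 8.0)
```

The second function is the form that would be needed to estimate the number of source lines if Csel could be estimated in advance.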

The systems were ranked from lowest to highest value of
Csel. Management's evaluation of the methodology employed on
these projects was also used to rank the projects. All these
values are summarized in Table 9.


Table 8. Calculations of the Methodology Parameter Csel


Total Weekly Effort Calc        Total Weekly Effort Searched

Table 9. Ranking Projects According to Methodology Using Csel
and Management Estimates

The projects were ranked, using the same ranking scheme as
above. The rankings are compared in Table 10.


                least                    greatest

Csel            PROJ 3    2    4    1    5

Methodology     PROJ 3    2    4    1    5


Table 10. Comparison of Management's Ranking of Projects
According to Methodology and Ranking Obtained
from Csel


It is seen that the ranking obtained using Csel seems to match
the ranking given by management. It is not clear whether or not
Csel is a product of the Rayleigh model, however. Like the esti-
mates for ta, td, and Ka, Csel is somewhat suspect. The link to
the Rayleigh model is made through the difficulty measure D1.
Csel relates to this measure and to total effort. D1 is not a
very good measure for difficulty at SEL and therefore provides a
weak link between the number of source lines and technology. The
constant Csel used by Putnam arose empirically when productivity
was plotted against difficulty. This seems like a rather loose
connection with the Rayleigh curve. Therefore it is not clear
whether or not the ranking can be credited to the use of the Ray-
leigh model. However, even though there does not seem to be any
theoretical support for these results at present, this does not
reduce their significance. These equations can still be used to
give management a quick estimate of how well things were done on
a project, in rankings which correspond to the rankings management
would have given.


From these results it cannot be decided whether or not the 
Rayleigh model itself could be used to classify medium scale pro- 
jects according to difficulty or methodology. If there were a way 
in which to estimate the constant Csel a priori, either by use of 
historical data or some evaluation method, it may be possible to 
estimate the number of source lines by using the equation for 
Ssel. This was not attempted because there were not a sufficient 
number of projects to estimate a value for Csel. How the Rayleigh 
model can be used to classify or size medium scale systems is 
uncertain, at least for the SEL environment. 

7. ANALYSIS OF COMPONENT DATA 

In this last section, attention is turned away from man-power 
distribution to other relations in the data. The object is 
twofold: to find invariants which may allow the data to be 
smoothed and analyzed further, and to find relations which will 
facilitate the understanding of effort distributions. 

Attention is centered around the relations between com- 
ponents and effort. It was reasoned that once the requirements 
had been defined, the problem of determining the number of com- 
ponents would be much more tractable than estimating the number 
of source lines of code. If a relation between the number of com- 
ponents and effort could be established, it would make the prob- 
lem of estimation much simpler than the traditional approach of 
estimating lines of code. 

The relation between the number of components and effort was 
studied to determine whether or not the effort distribution could 
be obtained by determining the number of components worked on. 
To do this, the total number of components in existence in the 
system in any given week was obtained from the CSR form and 
plotted. Components are defined as any named portion of the 
system. The weekly ratio of components worked on in a given week 
to the total number of components in existence that week was also 
plotted, along with the ratio of the number of hours worked in a 
given week to the number of components worked on that week. The 
ratios were computed for each week. Multiplying the number of 
existing components by these two ratios gives the weekly total 
effort. 

effort/week = (existing components/week) 
            x (components worked on/existing components)/week 
            x (effort/components worked on)/week 
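
The identity above can be checked mechanically. The following is a minimal sketch, assuming hypothetical weekly records; the function names and the sample numbers are illustrative only and are not taken from the CSR data.

```python
def decompose_week(existing, worked_on, hours):
    """Split one week's effort into the three factors named in the text.

    existing  -- components in existence that week
    worked_on -- components worked on that week (must be > 0)
    hours     -- hours of effort recorded that week
    """
    fraction_worked_on = worked_on / existing    # components worked on / existing components
    hours_per_component = hours / worked_on      # effort / components worked on
    return existing, fraction_worked_on, hours_per_component


def recombine_week(existing, fraction_worked_on, hours_per_component):
    """Multiply the three factors back together to recover weekly effort."""
    return existing * fraction_worked_on * hours_per_component


# Hypothetical weekly records: (existing components, components worked on, hours).
weeks = [(40, 10, 120.0), (55, 22, 200.0), (70, 14, 150.0)]

for existing, worked_on, hours in weeks:
    factors = decompose_week(existing, worked_on, hours)
    print(factors, "->", recombine_week(*factors))
```

Because the intermediate quantities cancel, the product always reproduces the recorded weekly effort; the decomposition is only informative if the individual factors follow simple curves.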

These ratios were fitted with six different two-parameter curves. 
Table 11 gives the parameters for the best fitting curves for 
existing components/week and the corresponding correlation for 
four of the projects. (The fits could not be performed on the 
other three projects.) Tables 12 and 13 give the same information 
for the other two ratios. 
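
The report does not state how the six fits were computed. One common approach, sketched below purely as an assumption, is to linearize each curve type and apply ordinary least squares to the transformed data, reporting the correlation r of the transformed points. The names linear_fit, CURVES, fit_all and best_fit are hypothetical, and the transforms require x > 0 and y > 0.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for Y = intercept + slope*X, plus correlation r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r


# curve name: (transform of x, transform of y, map (intercept, slope) -> (a, b))
CURVES = {
    "y=a+b*x":      (lambda x: x,       lambda y: y,       lambda i, s: (i, s)),
    "y=a*exp(b*x)": (lambda x: x,       math.log,          lambda i, s: (math.exp(i), s)),
    "y=a*(x**b)":   (math.log,          math.log,          lambda i, s: (math.exp(i), s)),
    "y=a+(b/x)":    (lambda x: 1.0 / x, lambda y: y,       lambda i, s: (i, s)),
    "y=1/(a+b*x)":  (lambda x: x,       lambda y: 1.0 / y, lambda i, s: (i, s)),
    # 1/y = a*(1/x) + b, so the slope is a and the intercept is b.
    "y=x/(a+b*x)":  (lambda x: 1.0 / x, lambda y: 1.0 / y, lambda i, s: (s, i)),
}


def fit_all(xs, ys):
    """Fit every curve type; return {name: (a, b, r)}."""
    results = {}
    for name, (fx, fy, recover) in CURVES.items():
        intercept, slope, r = linear_fit([fx(x) for x in xs], [fy(y) for y in ys])
        a, b = recover(intercept, slope)
        results[name] = (a, b, r)
    return results


def best_fit(xs, ys):
    """Pick the curve type whose transformed data has the largest |r|."""
    results = fit_all(xs, ys)
    name = max(results, key=lambda k: abs(results[k][2]))
    return name, results[name]
```

Note that correlations computed on transformed data are not strictly comparable across curve types, so a ranking obtained this way is only a rough guide.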





Curve type            PROJ 1      PROJ 2      PROJ 3      PROJ 5 

y = a + b*x       a   3.58        12.5        -33.9       2.41 
                  b   2.51        0.66        0.46        0.652 
                  r   6.52e-2     0.91        0.94        0.96 

y = a*exp(b*x)    a   3.40        9.73e-3?    9.38        [illegible] 
                  b   1.66        15.8?       7.23e-3     1.53e-2 
                  r   .03         .70         .94         .78 

y = a*(x**b)      a   [illegible] [illegible] 4.52        [illegible] 
                  b   -3.0e-2     .799        .483        .682 
                  r   1.61e-3     .81         .40         .82 

y = a + (b/x)     a   8.31        128?        82.7?       ?3.3 
                  b   .339        -13.1       -7.05       -7.35 
                  r   .001        .068        .028        .097 

y = 1/(a+b*x)     a   .292        .104        .065        .137 
                  b   ?.34e-4     -4.1e-4     -1.6e-4     -8.6e-4 
                  r   .003        .29         .88         .34 

y = x/(a+b*x)     a   -2.4e-2     4.8e-2      5.3e-1      4.?e-2 
                  b   .323        .024        .024        .031 
                  r   .011        .75         .11         .81 

Entries shown as [illegible] or marked with ? could not be read 
reliably in the source copy. 

Table 11. Total Number Existing Components in the System 


[Table body not legible in the source copy.] 

Table 12. Components Worked On/Existing Components 



Curve type            PROJ 1      PROJ 2      PROJ 3      PROJ 5 

y = a + b*x       a   3.58        6.96        25.0        12.4 
                  b   .025        .006        -4.75       4.33 
                  r   .006        .002        .31         .002 

y = a*exp(b*x)    a   3.40        6.15?       24.5        13.0 
                  b   .002        5.29?       -.0036      -.0003 
                  r   .004        .007        .30         .001 

y = a*(x**b)      a   5.41        3.17        17.4        21.5 
                  b   -.030       .159        -.11        -.030 
                  r   .002        .11         .026        .070 

y = a + (b/x)     a   [illegible] [illegible] 13.2        16.6 
                  b   .339        -.567       -.88        1.06 
                  r   .002        .045        .013        .011 

y = 1/(a+b*x)     a   .292        .203        .033        .086 
                  b   .0013       -5.62       .0004       .0001 
                  r   .002        .0008       .13         .014 

y = x/(a+b*x)     a   -.024       2.22        .006        -6.34 
                  b   .323        .189        .141        9.92 
                  r   .011        .026        .065        .026 

Entries shown as [illegible] or marked with ? could not be read 
reliably in the source copy. 

Table 13. Effort/Components Worked On 

The best fit was given by the straight line to existing 
components. Existing components was also fit well by the 
exponential curve. The ratio of components worked on to existing 
components was best fit by the exponential and secondly by the 
line. The ratio between effort and the number of components was 
not fit well by any of the curve types. Visual inspection of the 
plots of this ratio for each of the subcycles suggested the 
possibility that the ratio was constant during the coding phase. 
This observation was not substantiated by curve fitting, however. 

If what seem to be the best curve types are multiplied as 
was illustrated above, the following is obtained. 

effort/week = (a1+b1*t) * (a2*exp(b2*t)) * a3 

a1, b1, a2, b2 and a3 are constants resulting from the fits. This 
equation can be rewritten in the form, 

effort/week = (C1+C2*t) * exp(C3*t) 

where C1 = a1*a2*a3, C2 = b1*a2*a3 and C3 = b2. As can be seen, 
this equation differs from the Rayleigh equation. The expression 
in the exponential is a function of time, whereas in the Rayleigh 
equation it is a function of time squared. If the resulting 
equation had had the same form as the Rayleigh equation, it would 
have lent some support to the model. However, nothing can be said 
about the Rayleigh model from these results. 
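
The difference between the two forms can be made concrete with a short sketch. The Rayleigh function below uses the standard Norden-Rayleigh manning form y(t) = 2*K*a*t*exp(-a*t**2), with K the total effort and a a shape parameter; the function names and the constants used to exercise them are hypothetical, not values from the fits.

```python
import math


def composite_effort(t, a1, b1, a2, b2, a3):
    """Product of the three best-fitting curve types from the component data."""
    return (a1 + b1 * t) * (a2 * math.exp(b2 * t)) * a3


def collapse_constants(a1, b1, a2, b2, a3):
    """Fold the five fit constants into the three constants of the rewritten form."""
    return a1 * a2 * a3, b1 * a2 * a3, b2


def collapsed_effort(t, c1, c2, c3):
    """Rewritten form: linear factor times an exponential in t (not t squared)."""
    return (c1 + c2 * t) * math.exp(c3 * t)


def rayleigh_effort(t, K, a):
    """Norden-Rayleigh manning rate: the exponent depends on t squared."""
    return 2.0 * K * a * t * math.exp(-a * t * t)


# Hypothetical constants, for illustration only.
fit = (12.5, 0.66, 0.3, -0.004, 8.0)
c1, c2, c3 = collapse_constants(*fit)
for week in (0, 10, 20, 40):
    print(week, composite_effort(week, *fit), collapsed_effort(week, c1, c2, c3))
```

The composite form starts at a nonzero value C1 and decays (for negative C3) as exp(C3*t), whereas the Rayleigh form starts at zero and decays as exp(-a*t**2), so the two cannot be matched by a choice of constants.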

The fact that the plot of the total number of existing 
components was best fit by the straight line suggests that, for 
this environment, the system being developed grows by a constant 
number of components each week. The relation that is observed can 
be summarized as follows: the limit of the difference between the 
total number of components in week i and the total number of 
components in week i-1, as i approaches td, the time to the end of 
development, is constant. 
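
A simple way to inspect this relation on a project's data is to look at the week-to-week increments of the component count. The sketch below assumes a hypothetical list of weekly totals; the helper names are illustrative.

```python
def weekly_increments(totals):
    """Difference between the component count in week i and week i-1."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical weekly totals of existing components.
totals = [12, 15, 18, 20, 24, 27]
steps = weekly_increments(totals)
print(steps, mean(steps))
```

If growth is roughly linear, the increments scatter around a single value, which corresponds to the slope b of the straight-line fit.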

Other relations studied were total cumulative effort, the 
number of components worked on for a given week and the ratio of 
effort to the number of components in existence in the system. 
None of these relations proved very useful. The distribution of 
the number of components worked on in a given week does suggest a 
Rayleigh shape, but there is too much noise in this data to be 
certain even after smoothing. The plots for all the ratios for 
one of the projects studied are given in Appendix B. 

There do not seem to be any relations in the data which can 
be used in smoothing. Other than the total number of components 
in existence, none of the relations could be fit very well by any 
of the curve types. As far as gaining any further insight into 
the behavior of man-power, not much can be said. There does not 
seem to be any obvious relation between the number of components 
worked on and the effort expended. 

8. CONCLUSION 

It is clear that the Rayleigh curve is not the best fitting curve 
for the effort data in the SEL environment. Because of the 
contractor's familiarity with the problem area, a unique develop- 
ment environment exists which varies significantly from the 
environments studied by Putnam. Therefore it is natural that the 
Rayleigh model would not be adequate for this environment. The 
SEL environment differs principally in the design and testing 
phases. It is clear that much of the effort which would normally 
be required during the design phase is eliminated because of the 
contractor's knowledge of the problem area. The testing effort 
curve is different only in how testing effort is accounted for 
and how the time schedule for testing differs. Testing and 
acceptance testing are done as two distinct phases. If the effort 
data were collected as two different phases, it is very possible 
that each phase would exhibit a Rayleigh man-power distribution. 
The addition of the two curves would not necessarily result in a 
Rayleigh curve, as was illustrated in Figure 17. 
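
The last point can be illustrated numerically. The sketch below, under assumed parameters, adds two Rayleigh curves with different start dates and counts the local maxima of the result; a single Rayleigh curve has exactly one peak, so a sum with two peaks cannot itself be Rayleigh. The Rayleigh form used is the standard y(t) = 2*K*a*t*exp(-a*t**2), and all names and numbers are hypothetical.

```python
import math


def rayleigh(t, K, a, start=0.0):
    """Rayleigh manning rate for a phase beginning at `start` (zero before it)."""
    elapsed = t - start
    if elapsed <= 0:
        return 0.0
    return 2.0 * K * a * elapsed * math.exp(-a * elapsed * elapsed)


def count_peaks(values):
    """Number of strict local maxima in a sampled curve."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


weeks = range(0, 41)
# Hypothetical phases: same size and shape, second one starting 15 weeks later.
testing = [rayleigh(t, K=100.0, a=0.02) for t in weeks]
acceptance = [rayleigh(t, K=100.0, a=0.02, start=15.0) for t in weeks]
combined = [x + y for x, y in zip(testing, acceptance)]

print(count_peaks(testing), count_peaks(combined))
```

With closely overlapping phases the sum may still show a single peak, which is why the text says only that the result is not necessarily Rayleigh.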

The coding phase for the SEL environment seems to follow the 
Rayleigh curve closely. It may be that the coding phase is less 
affected by the contractor's familiarity with the problem area. 
Added experience may aid programmers in finding more efficient 
ways of implementing a particular design and in reducing the 
total amount of time spent on developing the code, but still not 
change the basic shape of the curve because the problem of coding 
is not really changed. Even with the added knowledge there is an 
upper limit to how fast code can be produced. In many cases the 
programmer may depend on another module to be completely coded. 
Other problems which are unique to each project may still need to 
be solved. Whereas many design problems were eliminated because 
the design already existed, coding specific solutions may be 
different enough from previous solutions that the code has to be 
redone or significantly modified. 

Since the Rayleigh curve fits this data well and since it is 
possible that the environmental differences did not cause 
significant deviations from a typical man-power distribution, the 
Rayleigh model is at least a possible candidate model for the 
man-power distribution of medium scale projects. However, there 
are at least four factors which have been studied that affect the 
overall weekly effort distribution for medium scale projects and 
which must be taken into account. 

First, the underlying subcycles of design, code and test may 
not all be Rayleigh in shape due to differences in the 
development environment. This was definitely the case for the SEL 
environment. Differences in management strategies can also cause 
significant deviations. This was observed in the testing phase 
and it also has been pointed out by Putnam. Second, the effort 
data gathered may not be that which the Rayleigh model is 
intended to model. The data should match as closely as possible 
the type of data used to formulate the model. Before attempting 
to apply the model one must carefully consider what the model is 
intended to model. Third, the effect of holidays on the total 
effort distribution of medium scale systems is much more 
pronounced than on large scale systems. This may be a result of 
the differences in granularity of the data collected for small to 
medium scale systems versus large scale systems (weekly vs. 
monthly or yearly). These predictable disturbances must be taken 
into account in small to medium scale projects, while they might 
be ignored for large projects. Finally, the relative dates for 
the start of various phases may vary significantly between 
projects. The ideal project phasing has not been thoroughly 
worked out. One solution for all projects most probably does not 
exist. Nevertheless, the timing of these milestones is under the 
control of management. 
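
As an illustration of the third factor, one possible way of allowing for holidays in weekly data is to scale each week's hours by the number of working days it contained. This is an assumption made here for illustration and is not a technique used in the report; the function name and the sample figures are hypothetical.

```python
def normalize_for_holidays(weekly_hours, working_days, full_week_days=5):
    """Scale each week's hours to what a full working week would have recorded.

    weekly_hours -- hours recorded in each week
    working_days -- number of working days in each week (holidays excluded)
    Weeks with no working days yield None, since there is no basis for scaling.
    """
    normalized = []
    for hours, days in zip(weekly_hours, working_days):
        if days <= 0:
            normalized.append(None)
        else:
            normalized.append(hours * full_week_days / days)
    return normalized


# Hypothetical data: the second week contains two holidays.
print(normalize_for_holidays([200.0, 120.0, 200.0], [5, 3, 5]))
```

With monthly or yearly data the same holidays change each data point by only a small fraction, which is consistent with the observation that they can be ignored for large projects.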

All these factors must be taken into account in any future 
studies of this model's applicability to medium scale projects. 
The environment studied is rather unusual and somewhat 
unrepresentative of a large section of the industry. The data 
used contains a lot of noise. Also, the smoothing techniques used 
may have caused some facts to be overlooked. 

The optimum man-power distribution curve for a typical 
development situation is not necessarily the optimal man-power 
solution for all situations. This is strongly suggested by the 
environment studied in the SEL. Furthermore, there are not really 
that many "typical" environments. The notion of what is "typical" 
is very hard to define. Putnam has tried to define the average 
behavior of man-power curves. Individual deviations will always 
exist. However, it is felt that the model does show promise not 